Indexing and classification of TV news articles based on speech dictation using word bigram
Abstract
To construct a news database with video on demand (VOD) functionality, news articles must be classified by topic. In this paper, we propose a method to automatically index and classify TV news articles into 10 topics, based on a speech dictation technique that uses speaker-independent triphone HMMs and a word bigram language model.
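As a rough illustration of the word bigram component mentioned in the abstract, the sketch below estimates P(word | previous word) from counts. The toy corpus, the function names `train_bigram` and `bigram_prob`, and the add-one smoothing are assumptions made for this example only; the abstract does not say how the paper's language model was trained or smoothed.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and word pairs over tokenized sentences (hypothetical helper)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]  # sentence boundary markers
        unigrams.update(tokens[:-1])          # every token that serves as a history
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    # Words that can be predicted: all corpus words plus the end marker.
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return unigrams, bigrams, vocab

def bigram_prob(prev, word, unigrams, bigrams, vocab):
    """P(word | prev) with add-one smoothing (an assumption, not the paper's choice)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

# Toy corpus, invented for illustration.
corpus = [["the", "news", "today"], ["the", "weather", "today"]]
unigrams, bigrams, vocab = train_bigram(corpus)
p_news = bigram_prob("the", "news", unigrams, bigrams, vocab)
```

In a dictation system, probabilities of this kind are combined with acoustic scores from the HMMs to rank candidate word sequences; smoothing keeps unseen word pairs from receiving zero probability.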
Similar resources
Message-driven speech recognition and topic-word extraction
This paper proposes a new formulation for speech recognition/understanding systems, in which the a posteriori probability of the message that the speaker intends to address, given an observed acoustic sequence, is maximized. This extends the current criterion, which maximizes the probability of a word sequence. Among the various possible representations, we employ cooccurrence score...
Performance evaluation of word phrase and noun category language models for broadcast news speech recognition
This paper reports our work to improve a bigram language model for Japanese TV broadcast news speech recognition. First, frequent word strings were grouped into phrases so that the phrases could be added to the lexicon as new recognition units. The test set perplexity improved when frequent function word strings were used as additional recognition units. The speech recognition performa...
Speaker Indexing for News Articles, Debates and Drama in Broadcasted TV Programs
In this paper, we propose a method to extract and verify individual speaker utterances using a subspace method. This method can extract speech sections of the same speaker by repeating speaker verification between the present speech section and the immediately preceding speech section. The speaker models are automatically trained in the verification process without constructing speaker templates i...
Real time speaker indexing based on subspace method - application to TV news articles and debate
In this paper, we propose a method to extract and verify individual speaker utterances using a subspace method. This method can extract speech sections of the same speaker by repeating speaker verification between the present speech section and the immediately preceding speech section. The speaker models are automatically trained in the verification process without constructing speaker templates i...
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Beyond its own merits, text classification (TC) has become a cornerstone of many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to automatically create modules that would help speed up classification for any text categorization task. It also serves as a tool for...